Importing Libraries

In [1]:
import collections
import re

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
import nltk
from wordcloud import WordCloud , ImageColorGenerator
from PIL import Image
In [3]:
from plotly.offline import plot
import plotly.express as px

from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

Loading Dataset

The data has been acquired from Kaggle open datasets. It is a raw dataset named “indian_food”, which represents entirety of the testing data for August 5th to October 5th, 2020.

Independent Variables- Actual measurement parameters of an Indian Dish which are name, ingredients, prep_time, cook_time, flavor_profile, course, state, region.

Dependent Variable- Classification of dish as part of the diet(vegetarian and non-vegetarian).

In [4]:
df = pd.read_csv('indian_food.csv')
df.head()
Out[4]:
name ingredients diet prep_time cook_time flavor_profile course state region
0 Balu shahi Maida flour, yogurt, oil, sugar vegetarian 45 25 sweet dessert West Bengal East
1 Boondi Gram flour, ghee, sugar vegetarian 80 30 sweet dessert Rajasthan West
2 Gajar ka halwa Carrots, milk, sugar, ghee, cashews, raisins vegetarian 15 60 sweet dessert Punjab North
3 Ghevar Flour, ghee, kewra, milk, clarified butter, su... vegetarian 15 30 sweet dessert Rajasthan West
4 Gulab jamun Milk powder, plain flour, baking powder, ghee,... vegetarian 15 40 sweet dessert West Bengal East
In [5]:
df.shape
Out[5]:
(255, 9)

Exploratory Data Analysis

In [6]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 255 entries, 0 to 254
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   name            255 non-null    object
 1   ingredients     255 non-null    object
 2   diet            255 non-null    object
 3   prep_time       255 non-null    int64 
 4   cook_time       255 non-null    int64 
 5   flavor_profile  255 non-null    object
 6   course          255 non-null    object
 7   state           255 non-null    object
 8   region          254 non-null    object
dtypes: int64(2), object(7)
memory usage: 18.1+ KB

It can be seen that only cook_time and prep_time are numeric continious variables. Others are categorical variables which needs to be one-hot encoded before model building.

In [7]:
df.isnull().sum()
Out[7]:
name              0
ingredients       0
diet              0
prep_time         0
cook_time         0
flavor_profile    0
course            0
state             0
region            1
dtype: int64
In [8]:
fig_flavorprofile = sns.countplot(data=df, x="flavor_profile", order = df['flavor_profile'].value_counts().index)
fig_flavorprofile.set_title("flavor_profile countplot")
Out[8]:
Text(0.5, 1.0, 'flavor_profile countplot')

The below diagram shows the distribution of dishes in terms of the flavor profile. There is an imbalance in the distribution as can be seen from the numbers. Imbalance - The clean dataset has an imbalance of 226:29 for Vegetarian: Non-Vegetarian. This is handled by SMOTE so that it does not lead to overfitting of vegetarian data points. The other solution for this is more data.

In [9]:
pie_chart = df.diet.value_counts().reset_index()
pie_chart.columns = ['diet','count']
fig = px.pie(pie_chart, values='count', names='diet', title='Vegetarian and Non-Vegetarian dishes Ratio')
fig.show()
In [10]:
ingredients = []
for i in range(len(df)):
    single_dish_ingredients =  df["ingredients"][i]
    ingredients = ingredients + [word.lower() for word in nltk.word_tokenize(single_dish_ingredients) if not word in ['.', ',']]
    
print(ingredients)
['maida', 'flour', 'yogurt', 'oil', 'sugar', 'gram', 'flour', 'ghee', 'sugar', 'carrots', 'milk', 'sugar', 'ghee', 'cashews', 'raisins', 'flour', 'ghee', 'kewra', 'milk', 'clarified', 'butter', 'sugar', 'almonds', 'pistachio', 'saffron', 'green', 'cardamom', 'milk', 'powder', 'plain', 'flour', 'baking', 'powder', 'ghee', 'milk', 'sugar', 'water', 'rose', 'water', 'sugar', 'syrup', 'lentil', 'flour', 'maida', 'corn', 'flour', 'baking', 'soda', 'vinegar', 'curd', 'water', 'turmeric', 'saffron', 'cardamom', 'cashews', 'ghee', 'cardamom', 'sugar', 'milk', 'cottage', 'cheese', 'sugar', 'milk', 'rice', 'sugar', 'dried', 'fruits', 'gram', 'flour', 'ghee', 'sugar', 'yogurt', 'milk', 'nuts', 'sugar', 'refined', 'flour', 'besan', 'ghee', 'powdered', 'sugar', 'yoghurt', 'green', 'cardamom', 'firm', 'white', 'pumpkin', 'sugar', 'kitchen', 'lime', 'alum', 'powder', 'rice', 'sugar', 'nuts', 'condensed', 'milk', 'sugar', 'spices', 'nuts', 'semolina', 'ghee', 'nuts', 'milk', 'khoa', 'coconut', 'molu', 'leaf', 'corn', 'flour', 'ghee', 'dry', 'fruits', 'gram', 'flour', 'ghee', 'sugar', 'milk', 'cardamom', 'chhena', 'sugar', 'ghee', 'chhena', 'sugar', 'milk', 'sugar', 'chenna', 'cheese', 'flour', 'cream', 'sugar', 'saffron', 'lemon', 'juice', 'coconut', 'flakes', 'chenna', 'condensed', 'milk', 'sugar', 'saffron', 'cardamom', 'chhena', 'sugar', 'ghee', 'flour', 'fried', 'milk', 'power', 'sugar', 'syrup', 'yoghurt', 'refined', 'flour', 'ghee', 'fennel', 'seeds', 'besan', 'flour', 'sugar', 'ghee', 'milk', 'jaggery', 'chhena', 'sugar', 'ghee', 'flour', 'rice', 'flour', 'wheat', 'flour', 'chenna', 'sweetened', 'milk', 'chhena', 'reduced', 'milk', 'pistachio', 'chhena', 'sugar', 'cardamom', 'milk', 'sugar', 'saffron', 'cardamom', 'rice', 'flour', 'jaggery', 'ghee', 'vegetable', 'oil', 'elachi', 'rice', 'flour', 'jaggery', 'ghee', 'besan', 'jaggery', 'cardamom', 'powder', 'ghee', 'cashews', 'and', 'raisins', 'jaggery', 'syrup', 'sugar', 'peanuts', 'jaggery', 'milk', 'sugar', 'dharwadi', 'buffalo', 'milk', 'loaf', 'bread', 'milk', 'rice', 'flour', 'sugar', 'salt', 'ghee', 'semolina', 'wheat', 'flour', 'sugar', 'black', 'lentils', 'rice', 'besan', 'flour', 'semolina', 'mung', 'bean', 'jaggery', 'coconut', 'skimmed', 'milk', 'powder', 'sugar', 'ghee', 'maida', 'flour', 'turmeric', 'coconut', 'chickpeas', 'jaggery', 'ghee', 'cardamom', 'rice', 'flour', 'milk', 'chana', 'dal', 'jaggery', 'rice', 'jaggery', 'cashews', 'ghee', 'rice', 'flour', 'powdered', 'sugar', 'ghee', 'apricots', 'sugar', 'syrup', 'vermicelli', 'pudding', 'milk', 'rice', 'flour', 'banana', 'jaggery', 'coconut', 'rice', 'flour', 'jaggery', 'coconut', 'rice', 'flour', 'jaggery', 'khus-khus', 'seeds', 'sugar', 'milk', 'nuts', 'cucumber', 'rava', 'milk', 'rice', 'sugar', 'dry', 'fruits', 'semolina', 'sugar', 'rice', 'flour', 'coconut', 'jaggery', 'sugar', 'ghee', 'maida', 'flour', 'semolina', 'curd', 'sugar', 'saffron', 'cardamom', 'maida', 'sugar', 'ghee', 'fish', 'potol', 'tomato', 'chillies', 'ginger', 'garlic', 'boiled', 'pork', 'onions', 'chillies', 'ginger', 'and', 'garlic', 'rice', 'milk', 'sugar', 'cardamom', 'rice', 'axone', 'salt', 'water', 'chillies', 'pork', 'cauliflower', 'potato', 'garam', 'masala', 'turmeric', 'curry', 'leaves', 'rice', 'flour', 'potato', 'bread', 'crumbs', 'garam', 'masala', 'salt', 'potato', 'peas', 'chillies', 'ginger', 'garam', 'masala', 'garlic', 'potato', 'fenugreek', 'leaves', 'chillies', 'salt', 'oil', 'potato', 'shimla', 'mirch', 'garam', 'masala', 'amchur', 'powder', 'salt', 'chole', 'rava', 'yogurt', 'plain', 'flour', 'baking', 'soda', 'ladies', 'finger', 'garam', 'masala', 'kasuri', 'methi', 'tomatoes', 'chili', 'powder', 'chicken', 'thighs', 'basmati', 'rice', 'star', 'anise', 'sweet', 'green', 'chillies', 'chicken', 'greek', 'yogurt', 'cream', 'garam', 'masala', 'powder', 'cashew', 'nuts', 'butter', 'chickpeas', 'tomato', 'paste', 'garam', 'masala', 'ginger', 'red', 'onion', 'avocado', 'oil', 'whole', 'wheat', 'flour', 'olive', 'oil', 'hot', 'water', 'all', 'purpose', 'flour', 'chicken', 'dahi', 'sesame', 'seeds', 'garam', 'masala', 'powder', 'cashew', 'nuts', 'saffron', 'naan', 'bread', 'tomato', 'sauce', 'skinless', 'chicken', 'breasts', 'heavy', 'cream', 'garam', 'masala', 'chicken', 'whole', 'wheat', 'bread', 'rice', 'flour', 'garam', 'masala', 'powder', 'whole', 'egg', 'chole', 'bhatura', 'garam', 'masala', 'bay', 'leaf', 'cinnamon', 'stick', 'moong', 'dal', 'masoor', 'dal', 'chana', 'dal', 'wheat', 'flour', 'almond', 'moong', 'dal', 'garam', 'masala', 'powder', 'garlic', 'green', 'chilli', 'all', 'purpose', 'flour', 'red', 'kidney', 'beans', 'urad', 'dal', 'cream', 'garam', 'masala', 'chili', 'powder', 'pigeon', 'peas', 'garam', 'masala', 'ginger', 'red', 'onion', 'kasuri', 'methi', 'baby', 'potatoes', 'garam', 'masala', 'cashew', 'nuts', 'kasuri', 'methi', 'tomatoes', 'beaten', 'rice', 'flakes', 'potato', 'curry', 'leaves', 'green', 'chilies', 'lemon', 'juice', 'chana', 'dal', 'whole', 'wheat', 'flour', 'arhar', 'dal', 'white', 'urad', 'dal', 'garam', 'masala', 'powder', 'moong', 'dal', 'rava', 'garam', 'masala', 'dough', 'fennel', 'seeds', 'cottage', 'cheese', 'bell', 'peppers', 'gravy', 'garam', 'masala', 'cashew', 'nuts', 'besan', 'garam', 'masala', 'powder', 'gram', 'flour', 'ginger', 'curry', 'leaves', 'bitter', 'gourd', 'fennel', 'garam', 'masala', 'powder', 'chili', 'powder', 'amchur', 'powder', 'moong', 'dal', 'green', 'peas', 'ginger', 'tomato', 'green', 'chili', 'paneer', 'potato', 'cream', 'corn', 'flour', 'garam', 'masala', 'rose', 'syrup', 'falooda', 'sev', 'mixed', 'nuts', 'saffron', 'sugar', 'bottle', 'gourd', 'garam', 'masala', 'powder', 'gram', 'flour', 'ginger', 'chillies', 'bottle', 'gourd', 'coconut', 'oil', 'garam', 'masala', 'ginger', 'green', 'chillies', 'wheat', 'flour', 'roasted', 'gram', 'flour', 'tomato', 'nigella', 'seeds', 'chilli', 'palak', 'makki', 'atta', 'mustard', 'green', 'garam', 'masala', 'ginger', 'whole', 'wheat', 'flour', 'chickpea', 'flour', 'green', 'chilies', 'mushroom', 'malai', 'garam', 'masala', 'ginger', 'capsicum', 'canned', 'coconut', 'milk', 'frozen', 'green', 'peas', 'wild', 'mushrooms', 'garam', 'masala', 'tomatoes', 'whole', 'wheat', 'flour', 'honey', 'butter', 'garlic', 'green', 'beans', 'potatoes', 'khus', 'khus', 'low', 'fat', 'garam', 'masala', 'powder', 'cottage', 'cheese', 'palak', 'cream', 'garam', 'masala', 'butter', 'paneer', 'whipping', 'cream', 'garam', 'masala', 'cashew', 'nuts', 'butter', 'paneer', 'greek', 'yogurt', 'tandoori', 'masala', 'cream', 'bell', 'pepper', 'kala', 'chana', 'mashed', 'potato', 'boondi', 'sev', 'lemon', 'whole', 'wheat', 'flour', 'musk', 'melon', 'seeds', 'poppy', 'seeds', 'edible', 'gum', 'semolina', 'urad', 'dal', 'sev', 'lemon', 'juice', 'chopped', 'tomatoes', 'wheat', 'flour', 'butter', 'potato', 'coriander', 'arbi', 'ke', 'patte', 'sesame', 'seeds', 'gur', 'bengal', 'gram', 'flour', 'imli', 'fennel', 'tea', 'bags', 'tomato', 'kasuri', 'methi', 'cinnamon', 'red', 'kidney', 'beans', 'garam', 'masala', 'powder', 'ginger', 'tomato', 'mustard', 'oil', 'garam', 'masala', 'powder', 'tomato', 'kasuri', 'methi', 'cinnamon', 'mustard', 'oil', 'potatoes', 'green', 'peas', 'garam', 'masala', 'ginger', 'dough', 'sattu', 'atta', 'dough', 'filling', 'mustard', 'oil', 'cottage', 'cheese', 'malai', 'garam', 'masala', 'ginger', 'tomato', 'rose', 'water', 'milk', 'white', 'bread', 'slices', 'saffron', 'almonds', 'baby', 'corn', 'french', 'beans', 'garam', 'masala', 'ginger', 'carrot', 'greek', 'yogurt', 'garam', 'masala', 'kasuri', 'methi', 'marinade', 'mustard', 'oil', 'chickpea', 'flour', 'biryani', 'masala', 'powder', 'yogurt', 'fish', 'fillets', 'green', 'bell', 'pepper', 'whole', 'wheat', 'flour', 'arhar', 'dal', 'ginger', 'kala', 'jeera', 'green', 'chilli', 'raw', 'banana', 'elephant', 'foot', 'yam', 'long', 'beans', 'tindora', 'urad', 'dal', 'split', 'pigeon', 'peas', 'chana', 'dal', 'urad', 'dal', 'green', 'peas', 'french', 'beans', 'chana', 'dal', 'urad', 'dal', 'fresh', 'coconut', 'sesame', 'seeds', 'curry', 'leaves', 'chana', 'dal', 'urad', 'dal', 'whole', 'urad', 'dal', 'blend', 'rice', 'rock', 'salt', 'rice', 'flour', 'hot', 'water', 'grated', 'coconut', 'split', 'urad', 'dal', 'urad', 'dal', 'idli', 'rice', 'thick', 'poha', 'rock', 'salt', 'carrot', 'yellow', 'mustard', 'red', 'chilli', 'black', 'salt', 'sesame', 'oil', 'drumstick', 'tamarind', 'paste', 'sambar', 'powder', 'tomato', 'moong', 'dal', 'chana', 'dal', 'spinach', 'urad', 'dal', 'coconut', 'oil', 'urad', 'dal', 'curry', 'leaves', 'sugar', 'mustard', 'seeds', 'spinach', 'greens', 'tomato', 'mustard', 'seeds', 'fenugreek', 'seeds', 'amaranth', 'leaves', 'split', 'urad', 'dal', 'mustard', 'seeds', 'grated', 'coconut', 'red', 'chili', 'beef', 'coconut', 'garam', 'masala', 'curry', 'leaves', 'green', 'chilies', 'chili', 'powder', 'chana', 'dal', 'urad', 'dal', 'potato', 'beans', 'peas', 'moong', 'dal', 'chana', 'dal', 'cabbage', 'tamarind', 'curry', 'leaves', 'moong', 'dal', 'cucumber', 'curry', 'leaves', 'green', 'chili', 'lemon', 'juice', 'chana', 'dal', 'urad', 'dal', 'gooseberry', 'raw', 'rice', 'curry', 'leaves', 'sesame', 'oil', 'raw', 'rice', 'jaggery', 'grated', 'coconut', 'pearl', 'onions', 'urad', 'dal', 'drumsticks', 'tomato', 'curry', 'leaves', 'chana', 'dal', 'urad', 'dal', 'potatoes', 'idli', 'rice', 'thick', 'poha', 'coconut', 'oil', 'cucumber', 'curd', 'curry', 'leaves', 'mustard', 'seeds', 'yogurt', 'ginger', 'curry', 'leaves', 'baking', 'soda', 'green', 'chilli', 'lentils', 'black', 'pepper', 'vegetable', 'oil', 'raw', 'rice', 'jaggery', 'milk', 'rice', 'cashew', 'nuts', 'milk', 'raisins', 'sugar', 'arhar', 'dal', 'sambar', 'powder', 'tomato', 'curry', 'leaves', 'fennel', 'seeds', 'green', 'moong', 'beans', 'rice', 'flour', 'chana', 'dal', 'urad', 'dal', 'beans', 'coconut', 'mustard', 'urad', 'dal', 'lemon', 'tamarind', 'cooked', 'rice', 'curry', 'leaves', 'tomato', 'curry', 'leaves', 'garlic', 'mustard', 'seeds', 'hot', 'water', 'brown', 'rice', 'flour', 'sugar', 'grated', 'coconut', 'pigeon', 'peas', 'eggplant', 'drumsticks', 'sambar', 'powder', 'tamarind', 'thin', 'rice', 'flakes', 'black', 'sesame', 'seeds', 'curry', 'leaves', 'sevai', 'parboiled', 'rice', 'steamer', 'urad', 'dal', 'curd', 'sesame', 'oil', 'ginger', 'curry', 'leaves', 'mustard', 'seeds', 'coconut', 'whole', 'red', 'beans', 'masala', 'sesame', 'oil', 'tamarind', 'chana', 'dal', 'urad', 'dal', 'thick', 'poha', 'tomato', 'butter', 'urad', 'dal', 'ginger', 'curry', 'leaves', 'green', 'chilies', 'black', 'pepper', 'meat', 'curry', 'powder', 'chicken', 'chunks', 'ginger', 'tomato', 'cinnamon', 'chana', 'dal', 'urad', 'dal', 'ginger', 'curry', 'leaves', 'sugar', 'kala', 'masala', 'arhar', 'dal', 'curry', 'leaves', 'mustard', 'seeds', 'hot', 'water', 'gram', 'flour', 'mustard', 'garlic', 'turmeric', 'red', 'chilli', 'baingan', 'fish', 'coconut', 'oil', 'fresh', 'coconut', 'ginger', 'urad', 'dal', 'potatoes', 'wheat', 'flour', 'sooji', 'wheat', 'flour', 'pearl', 'millet', 'flour', 'hot', 'water', 'condensed', 'milk', 'mawa', 'desiccated', 'coconut', 'almonds', 'cashews', 'jowar', 'flour', 'sesame', 'seeds', 'bombay', 'duck', 'malvani', 'masala', 'rice', 'flour', 'bombay', 'rava', 'green', 'chilies', 'rice', 'flour', 'sesame', 'plain', 'flour', 'turmeric', 'red', 'chilli', 'citric', 'acid', 'fry', 'raisins', 'sugar', 'chana', 'daal', 'urad', 'dal', 'bengal', 'gram', 'flour', 'dried', 'mango', 'baking', 'soda', 'black', 'salt', 'condensed', 'milk', 'nestle', 'cream', 'coconut', 'ice', 'red', 'food', 'coloring', 'desiccated', 'coconut', 'whole', 'wheat', 'flour', 'dal', 'kokum', 'gur', 'bengal', 'gram', 'flour', 'pav', 'aloo', 'peanut', 'pomegranate', 'star', 'anise', 'urad', 'dal', 'bhuna', 'chana', 'garam', 'masala', 'dates', 'tamarind', 'arhar', 'dal', 'coconut', 'oil', 'curry', 'leaves', 'mustard', 'seeds', 'red', 'chilli', 'rava', 'coconut', 'gram', 'flour', 'mustard', 'sesame', 'bottle', 'gourd', 'green', 'raisins', 'sugar', 'clarified', 'butter', 'yogurt', 'besan', 'sauce', 'garam', 'masala', 'powder', 'gram', 'flour', 'wheat', 'flour', 'jaggery', 'clarified', 'butter', 'sliced', 'almonds', 'dry', 'fruits', 'semolina', 'all', 'purpose', 'flour', 'bottle', 'gourd', 'chana', 'dal', 'cabbage', 'urad', 'dal', 'toor', 'dal', 'whole', 'wheat', 'rava', 'chia', 'seed', 'lemon', 'edible', 'gum', 'litre', 'milk', 'green', 'chilies', 'lemon', 'juice', 'chili', 'powder', 'boiled', 'potatoes', 'wheat', 'flour', 'cashews', 'rapeseed', 'oil', 'mango', 'sugar', 'whole', 'wheat', 'flour', 'low', 'fat', 'bengal', 'gram', 'flour', 'green', 'chili', 'paste', 'white', 'sesame', 'seeds', 'gram', 'flour', 'curry', 'leaves', 'green', 'chili', 'rice', 'flour', 'urad', 'dal', 'wheat', 'flour', 'gram', 'flour', 'turmeric', 'cinnamon', 'jaggery', 'clarified', 'butter', 'dry', 'roasted', 'cucumber', 'carrot', 'tomatoes', 'cilantro', 'rava', 'gram', 'flour', 'lemon', 'juice', 'turmeric', 'fenugreek', 'leaves', 'rose', 'water', 'pistachio', 'badam', 'bengal', 'gram', 'flour', 'saffron', 'bottle', 'gourd', 'whole', 'wheat', 'flour', 'rava', 'sesame', 'seeds', 'bengal', 'gram', 'flour', 'arbi', 'ke', 'patte', 'sesame', 'seeds', 'gur', 'bengal', 'gram', 'flour', 'imli', 'pav', 'bhaji', 'masala', 'gobi', 'potatoes', 'green', 'peas', 'dinner', 'rolls', 'aloo', 'urad', 'dal', 'mustard', 'ginger', 'curry', 'leaves', 'raw', 'peanuts', 'sabudana', 'lemon', 'avocado', 'oil', 'curry', 'leaves', 'green', 'chili', 'khaman', 'pomegranate', 'sev', 'powdered', 'sugar', 'garlic', 'sev', 'ginger', 'tomato', 'sugar', 'wheat', 'flour', 'baking', 'soda', 'all', 'purpose', 'flour', 'black', 'pepper', 'sunflower', 'oil', 'whole', 'wheat', 'flour', 'gur', 'clarified', 'butter', 'rice', 'flakes', 'yogurt', 'raw', 'rice', 'jaggery', 'grated', 'coconut', 'whole', 'wheat', 'flour', 'rice', 'flour', 'pearl', 'millet', 'flour', 'sorghum', 'flour', 'sesame', 'seeds', 'sweet', 'potato', 'surti', 'papdi', 'baby', 'potatoes', 'valor', 'papdi', 'green', 'peas', 'gobi', 'potato', 'beans', 'khus', 'khus', 'coconut', 'chicken', 'coconut', 'oil', 'wine', 'vinegar', 'ginger', 'green', 'cinnamon', 'green', 'garlic', 'chutney', 'fresh', 'green', 'peas', 'ginger', 'lemon', 'juice', 'plain', 'flour', 'moong', 'beans', 'jaggery', 'red', 'chillies', 'oil', 'salt', 'rice', 'flour', 'sesame', 'seeds', 'baking', 'soda', 'peanut', 'oil', 'chickpea', 'flour', 'methi', 'leaves', 'jowar', 'flour', 'wheat', 'flour', 'semolina', 'clarified', 'butter', 'oil', 'white', 'flour', 'black', 'pepper', 'yogurt', 'fresh', 'coconut', 'sesame', 'seeds', 'semolina', 'gram', 'flour', 'ridge', 'gourd', 'baking', 'soda', 'sugar', 'grated', 'coconut', 'peas', 'whole', 'wheat', 'flour', 'khus', 'khus', 'sesame', 'seeds', 'dry', 'coconut', 'gur', 'rice', 'mango', 'curd', 'sticky', 'rice', 'rice', 'flour', 'jaggery', 'orange', 'rind', 'raw', 'papaya', 'panch', 'phoran', 'masala', 'nigella', 'seeds', 'mustard', 'oil', 'fennel', 'seeds', 'rice', 'eggs', 'carrot', 'beetroot', 'maida', 'vegetable', 'oil', 'potatoes', 'mustard', 'oil', 'fish', 'green', 'chillies', 'ridge', 'gourd', 'fish', 'lemon', 'tomatoes', 'mustard', 'oil', 'brinjal', 'onions', 'salt', 'sesame', 'seeds', 'coriander', 'potatoes', 'garam', 'masala', 'tomatoes', 'mustard', 'oil', 'bay', 'leaf', 'forbidden', 'black', 'rice', 'chicken', 'olive', 'oil', 'slivered', 'almonds', 'garlic', 'powder', 'biryani', 'masala', 'mixed', 'vegetables', 'yellow', 'moong', 'daal', 'whole', 'red', 'mustard', 'seeds', 'brown', 'rice', 'soy', 'sauce', 'olive', 'oil', 'coconut', 'milk', 'lobster', 'fresh', 'green', 'chilli', 'ginger', 'red', 'onion', 'baking', 'soda', 'clarified', 'butter', 'oil', 'all', 'purpose', 'flour', 'jaggery', 'raisins', 'lamb', 'garam', 'masala', 'powder', 'curd', 'turmeric', 'bay', 'leaf', 'coconut', 'prawns', 'curd', 'mustard', 'seed', 'green', 'chili', 'fish', 'fillet', 'besan', 'lemon', 'mint', 'ginger', 'fermented', 'bamboo', 'shoot', 'potato', 'ginger', 'green', 'mustard', 'oil', 'banana', 'flower', 'chicken', 'green', 'chili', 'mustard', 'oil', 'lemon', 'juice', 'aloo', 'tomatoes', 'mustard', 'oil', 'bay', 'leaf', 'cinnamon', 'stick', 'rice', 'flour', 'mutton', 'banana', 'gram', 'flour', 'olive', 'oil', 'baking', 'powder', 'fish', 'roe', 'pumpkin', 'flowers', 'mustard', 'oil', 'turmeric', 'tomato', 'chana', 'dal', 'fresh', 'coconut', 'ginger', 'cinnamon', 'raisins', 'curd', 'cooked', 'rice', 'curry', 'leaves', 'dry', 'chilli', 'tea', 'leaves', 'white', 'sesame', 'seeds', 'dry', 'coconut', 'soaked', 'rice', 'basmati', 'rice', 'rose', 'water', 'sugar', 'clarified', 'butter', 'cardamom', 'pods', 'coconut', 'milk', 'prawns', 'garlic', 'turmeric', 'sugar', 'red', 'pepper', 'red', 'onion', 'butter', 'watercress', 'olive', 'oil', 'green', 'beans', 'bitter', 'gourd', 'ridge', 'gourd', 'banana', 'brinjal', 'glutinous', 'rice', 'black', 'sesame', 'seeds', 'gur', 'coconut', 'milk', 'egg', 'yolks', 'clarified', 'butter', 'all', 'purpose', 'flour', 'cottage', 'cheese', 'dry', 'dates', 'dried', 'rose', 'petals', 'pistachio', 'badam', 'milk', 'powder', 'dry', 'fruits', 'arrowroot', 'powder', 'all', 'purpose', 'flour', 'brown', 'rice', 'fennel', 'seeds', 'grated', 'coconut', 'black', 'pepper', 'ginger', 'powder']
In [11]:
word_freq={}
word_freq = collections.Counter(ingredients)
W = WordCloud(background_color="white").fit_words(word_freq)

plt.figure(figsize = (10, 10), facecolor = None) 
plt.imshow(W)
plt.axis('off')
plt.show()

The diagram below shows the ingredients used majorly in the preparation of Indian dishes.

In [12]:
for i in range(0,len(ingredients)):
    text = ' '.join(ingredients)

india_coloring = np.array(Image.open('ind.jpg'))

wc = WordCloud(background_color="white", width = 400, height = 400, mask=india_coloring, min_font_size=8)
wc.generate(text)

image_colors = ImageColorGenerator(india_coloring)

plt.figure(figsize = (20, 20))
plt.imshow(wc.recolor(color_func=image_colors), interpolation="bilinear")
plt.axis('off')
plt.show()
In [13]:
words = np.array(list(word_freq.keys()))
print(words)
['maida' 'flour' 'yogurt' 'oil' 'sugar' 'gram' 'ghee' 'carrots' 'milk'
 'cashews' 'raisins' 'kewra' 'clarified' 'butter' 'almonds' 'pistachio'
 'saffron' 'green' 'cardamom' 'powder' 'plain' 'baking' 'water' 'rose'
 'syrup' 'lentil' 'corn' 'soda' 'vinegar' 'curd' 'turmeric' 'cottage'
 'cheese' 'rice' 'dried' 'fruits' 'nuts' 'refined' 'besan' 'powdered'
 'yoghurt' 'firm' 'white' 'pumpkin' 'kitchen' 'lime' 'alum' 'condensed'
 'spices' 'semolina' 'khoa' 'coconut' 'molu' 'leaf' 'dry' 'chhena'
 'chenna' 'cream' 'lemon' 'juice' 'flakes' 'fried' 'power' 'fennel'
 'seeds' 'jaggery' 'wheat' 'sweetened' 'reduced' 'vegetable' 'elachi'
 'and' 'peanuts' 'dharwadi' 'buffalo' 'loaf' 'bread' 'salt' 'black'
 'lentils' 'mung' 'bean' 'skimmed' 'chickpeas' 'chana' 'dal' 'apricots'
 'vermicelli' 'pudding' 'banana' 'khus-khus' 'cucumber' 'rava' 'fish'
 'potol' 'tomato' 'chillies' 'ginger' 'garlic' 'boiled' 'pork' 'onions'
 'axone' 'cauliflower' 'potato' 'garam' 'masala' 'curry' 'leaves' 'crumbs'
 'peas' 'fenugreek' 'shimla' 'mirch' 'amchur' 'chole' 'ladies' 'finger'
 'kasuri' 'methi' 'tomatoes' 'chili' 'chicken' 'thighs' 'basmati' 'star'
 'anise' 'sweet' 'greek' 'cashew' 'paste' 'red' 'onion' 'avocado' 'whole'
 'olive' 'hot' 'all' 'purpose' 'dahi' 'sesame' 'naan' 'sauce' 'skinless'
 'breasts' 'heavy' 'egg' 'bhatura' 'bay' 'cinnamon' 'stick' 'moong'
 'masoor' 'almond' 'chilli' 'kidney' 'beans' 'urad' 'pigeon' 'baby'
 'potatoes' 'beaten' 'chilies' 'arhar' 'dough' 'bell' 'peppers' 'gravy'
 'bitter' 'gourd' 'paneer' 'falooda' 'sev' 'mixed' 'bottle' 'roasted'
 'nigella' 'palak' 'makki' 'atta' 'mustard' 'chickpea' 'mushroom' 'malai'
 'capsicum' 'canned' 'frozen' 'wild' 'mushrooms' 'honey' 'khus' 'low'
 'fat' 'whipping' 'tandoori' 'pepper' 'kala' 'mashed' 'boondi' 'musk'
 'melon' 'poppy' 'edible' 'gum' 'chopped' 'coriander' 'arbi' 'ke' 'patte'
 'gur' 'bengal' 'imli' 'tea' 'bags' 'sattu' 'filling' 'slices' 'french'
 'carrot' 'marinade' 'biryani' 'fillets' 'jeera' 'raw' 'elephant' 'foot'
 'yam' 'long' 'tindora' 'split' 'fresh' 'blend' 'rock' 'grated' 'idli'
 'thick' 'poha' 'yellow' 'drumstick' 'tamarind' 'sambar' 'spinach'
 'greens' 'amaranth' 'beef' 'cabbage' 'gooseberry' 'pearl' 'drumsticks'
 'cooked' 'brown' 'eggplant' 'thin' 'sevai' 'parboiled' 'steamer' 'meat'
 'chunks' 'baingan' 'sooji' 'millet' 'mawa' 'desiccated' 'jowar' 'bombay'
 'duck' 'malvani' 'citric' 'acid' 'fry' 'daal' 'mango' 'nestle' 'ice'
 'food' 'coloring' 'kokum' 'pav' 'aloo' 'peanut' 'pomegranate' 'bhuna'
 'dates' 'sliced' 'toor' 'chia' 'seed' 'litre' 'rapeseed' 'cilantro'
 'badam' 'bhaji' 'gobi' 'dinner' 'rolls' 'sabudana' 'khaman' 'sunflower'
 'sorghum' 'surti' 'papdi' 'valor' 'wine' 'chutney' 'ridge' 'sticky'
 'orange' 'rind' 'papaya' 'panch' 'phoran' 'eggs' 'beetroot' 'brinjal'
 'forbidden' 'slivered' 'vegetables' 'soy' 'lobster' 'lamb' 'prawns'
 'fillet' 'mint' 'fermented' 'bamboo' 'shoot' 'flower' 'mutton' 'roe'
 'flowers' 'soaked' 'pods' 'watercress' 'glutinous' 'yolks' 'petals'
 'arrowroot']
In [14]:
def create_ingredientsVector(ingredients):
    ingredients_vec = np.zeros(words.shape)
    ingredients = set([word.lower() for word in nltk.word_tokenize(ingredients) if not word in ['.', ',']])
    for ingredient in ingredients:
        idx = np.where(words == ingredient)
        ingredients_vec[idx] = 1
    return ingredients_vec.tolist()
In [15]:
df["ingredients_vec"] = df["ingredients"].map(create_ingredientsVector)
df.head()
Out[15]:
name ingredients diet prep_time cook_time flavor_profile course state region ingredients_vec
0 Balu shahi Maida flour, yogurt, oil, sugar vegetarian 45 25 sweet dessert West Bengal East [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, ...
1 Boondi Gram flour, ghee, sugar vegetarian 80 30 sweet dessert Rajasthan West [0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, ...
2 Gajar ka halwa Carrots, milk, sugar, ghee, cashews, raisins vegetarian 15 60 sweet dessert Punjab North [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, ...
3 Ghevar Flour, ghee, kewra, milk, clarified butter, su... vegetarian 15 30 sweet dessert Rajasthan West [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, ...
4 Gulab jamun Milk powder, plain flour, baking powder, ghee,... vegetarian 15 40 sweet dessert West Bengal East [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, ...

The ingredients need to be tokenized for further analysis. This is the most important feature of the model where the diet is predicted mainly using the ingredients. The food ingredients are taken and vectors are created for each and every dish. This is similar to label encoding. This is done so that the algorithm can process the contents of the dish in the prediction process. The vectors are of the shape (255, 337) or (no of dishes, no of total ingredients).

In [16]:
ingredients_vecs = []
for i in range(len(df)):
    ingredients_vecs.append(df["ingredients_vec"][i])
    
ingredients_vecs = np.array(ingredients_vecs)
In [17]:
print(ingredients_vecs.shape)
(255, 337)
In [18]:
from sklearn.metrics.pairwise import cosine_similarity
cos_simi_matrix = cosine_similarity(ingredients_vecs, ingredients_vecs)
In [19]:
plt.figure(figsize=(20, 20))
fig = sns.heatmap(cos_simi_matrix, cmap="Spectral")
fig.set_title("Cosine Similarity of Ingredient Vectors")
Out[19]:
Text(0.5, 1.0, 'Cosine Similarity of Ingredient Vectors')

Correlation - The correlation heatmap was used to check the correlation between the features and no highly correlated feature were indicated. All correlations are less than 0.8, indicating low correlation.

In the Heatmap 0 - 66th (these numbers correspond to index of "data frame") ingredients vectors have high cosine similarity each other. Cosine similarity to calculate similarity of ingredient vectors. If cosine similarity between two foods is high, it can be inferred that dishes are similar.

In [20]:
df[df['name'].isin(['Kheer', 'Phirni', 'Rabri'])]
Out[20]:
name ingredients diet prep_time cook_time flavor_profile course state region ingredients_vec
9 Kheer Milk, rice, sugar, dried fruits vegetarian 10 40 sweet dessert -1 -1 [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, ...
14 Phirni Rice, sugar, nuts vegetarian 30 20 sweet dessert Odisha East [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ...
15 Rabri Condensed milk, sugar, spices, nuts vegetarian 10 45 sweet dessert Uttar Pradesh North [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, ...

Ingredient vectors are used to check the cosine similarity between 2 dishes. Mathematically, it measures the cosine of the angle between two vectors projected in a multi-dimensional space. The cosine similarity is advantageous because even if the two vectors are far apart by the Euclidean distance, chances are they may still be oriented closer together. The smaller the angle, higher the cosine similarity.

In [21]:
cosine_similarity([ingredients_vecs[9]], [ingredients_vecs[14]])
Out[21]:
array([[0.51639778]])
In [22]:
cosine_similarity([ingredients_vecs[14]], [ingredients_vecs[15]])
Out[22]:
array([[0.51639778]])

As seen above the dishes at location 9, 14, and 15 are sweet dishes and have very similar ingredients. Therefore the cosine similarity between the ingredient vectors is high, indicating actual closeness between the dishes.

In [23]:
df.iloc[30]['name']
Out[23]:
'Pantua'
In [24]:
cosine_similarity([ingredients_vecs[9]], [ingredients_vecs[30]])
Out[24]:
array([[0.2236068]])

As seen above the dich at location 30 is a savory dish compared to a sweet dish. Therefore having a small cosine similarity between them, Indicating no actual closeness between the contents of the dishes.

In [25]:
from sklearn.cluster import KMeans
wcss = []
for i in range(1, 10):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', max_iter = 300, n_init = 10,random_state = 100)
    kmeans.fit(ingredients_vecs)
    wcss.append(kmeans.inertia_)

#Plot Elbow Method
plt.plot(range(1, 10), wcss,marker='o')
plt.title('The elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS') #within cluster sum of squares
plt.show()

It is a supervised learning problem but we would like to see in haow may categories would a dish be categorised given that the data set was not labelled. For This we use K-means algorithm. It is a clustering algorithm which searches for a pre-determined number of clusters within an unlabeled multidimensional dataset.

In [26]:
#Create Silhouette Coefficients
from sklearn.metrics import silhouette_score
for n_cluster in range(2, 10):
    kmeans = KMeans(n_clusters=n_cluster).fit(ingredients_vecs)
    label = kmeans.labels_
    sil_coeff = silhouette_score(ingredients_vecs, label, metric='euclidean')
    print('For n_clusters= {}, The Silhouette Coefficient is {}'.format(n_cluster, sil_coeff))
For n_clusters= 2, The Silhouette Coefficient is 0.09492531747643693
For n_clusters= 3, The Silhouette Coefficient is 0.05158870015669986
For n_clusters= 4, The Silhouette Coefficient is 0.03304555196304871
For n_clusters= 5, The Silhouette Coefficient is 0.04614604315634691
For n_clusters= 6, The Silhouette Coefficient is 0.05201535279374043
For n_clusters= 7, The Silhouette Coefficient is 0.051184780857399435
For n_clusters= 8, The Silhouette Coefficient is 0.048707490867506496
For n_clusters= 9, The Silhouette Coefficient is 0.0390869605063667

Silhouette score is used to evaluate the quality of clusters created using clustering algorithms such as K-Means in terms of how well samples are clustered with other samples that are similar to each other. The Silhouette score is calculated for each sample of different clusters. In the set of Silhouette score where ever there is an abrupt change that is the point of optimal number of clusters.

In [27]:
food_vocab = set()

for ingredients in df['ingredients']:
    for food in ingredients.split(','):
        if food.strip().lower() not in food_vocab:
            food_vocab.add(food.strip().lower())
In [28]:
len(food_vocab)
Out[28]:
365
In [29]:
print(food_vocab)
{'hot water', 'ginger and garlic', 'carrot', 'green chili paste', 'mustard oil', 'marinade', 'sauce', 'bhatura', 'french beans', 'sesame seeds', 'red onion', 'ginger', 'rose syrup', 'bengal gram flour', 'nestle cream', 'khaman', 'rice', 'gravy', 'coconut flakes', 'rapeseed oil', 'wine vinegar', 'cardamom powder', 'soy sauce', 'shimla mirch', 'milk powder', 'arrowroot powder', 'yellow mustard', 'beef', 'aloo', 'toor dal', 'litre milk', 'orange rind', 'dharwadi buffalo milk', 'fermented bamboo shoot', 'mung bean', 'atta', 'dry dates', 'dry fruits', 'coconut', 'pearl onions', 'chia seed', 'sabudana', 'black pepper', 'split pigeon peas', 'khus khus', 'star anise', 'elachi', 'water', 'fish fillet', 'peanuts', 'ginger powder', 'fennel seeds', 'raw banana', 'mushroom', 'egg yolks', 'plain flour', 'urad dal', 'sunflower oil', 'green chili', 'green chilies', 'dried rose petals', 'boondi', 'green', 'sweet potato', 'falooda sev', 'lentils', 'steamer', 'onions', 'condensed milk', 'bell pepper', 'amchur powder', 'fenugreek seeds', 'forbidden black rice', 'watercress', 'kala masala', 'skimmed milk powder', 'pav', 'boiled potatoes', 'whole wheat bread', 'green chillies', 'garam masala', 'nuts', 'cashews', 'brown rice', 'brown rice flour', 'garlic', 'red kidney beans', 'chicken thighs', 'butter', 'fresh green peas', 'mustard seeds', 'rava', 'thin rice flakes', 'jowar flour', 'fennel', 'beetroot', 'tomato paste', 'white urad dal', 'dry coconut', 'rock salt', 'desiccated coconut', 'sweet', 'kala chana', 'eggs', 'malvani masala', 'frozen green peas', 'potol', 'cardamom pods', 'pomegranate', 'loaf bread', 'whole wheat flour', 'nigella seeds', 'flour', 'potato', 'ghee', 'drumsticks', 'fry', 'pumpkin flowers', 'milk', 'lamb', 'cauliflower', 'arbi ke patte', 'raw peanuts', 'firm white pumpkin', 'pork', 'powdered sugar', 'fenugreek leaves', 'sorghum flour', 'black salt', 'garlic powder', 'chillies', 'paneer', 'long beans', 'cream', 'khus-khus seeds', 'tamarind paste', 'mango', 'soaked rice', 'white flour', 'greek yogurt', 'glutinous rice', 'coriander', 'white sesame seeds', 'arhar dal', 'peanut oil', 'axone', 'bombay rava', 'red chili', 'bottle gourd', 'vegetable oil', 'dry roasted', 'dough', 'heavy cream', 'cottage cheese', 'rice flour', 'chana dal', 'maida flour', 'chenna', 'gram flour', 'mustard', 'edible gum', 'kitchen lime', 'cooked rice', 'kokum', 'chana daal', 'baking powder', 'meat curry powder', 'bitter gourd', 'chhena', 'vinegar', 'tea leaves', 'raisins', 'chickpeas', 'yogurt', 'pav bhaji masala', 'black lentils', 'white bread slices', 'cilantro', 'masala', 'jaggery syrup', 'dried fruits', 'carrots', 'brinjal', 'naan bread', 'sticky rice', 'biryani masala powder', 'raw rice', 'lobster', 'fresh green chilli', 'red food coloring', 'sattu', 'whole red', 'masoor dal', 'tamarind', 'kala jeera', 'bread crumbs', 'banana flower', 'avocado oil', 'chicken chunks', 'raw papaya', 'red chilli', 'sev', 'badam', 'cardamom', 'fish', 'honey', 'lemon', 'turmeric', 'potatoes', 'palak', 'yoghurt', 'boiled pork', 'yellow moong daal', 'gur', 'ladies finger', 'parboiled rice', 'green peas', 'blend rice', 'mixed vegetables', 'olive oil', 'surti papdi', 'whole egg', 'chili powder', 'greens', 'bombay duck', 'tomatoes', 'sevai', 'filling', 'coconut oil', 'whipping cream', 'peas', 'coconut ice', 'kasuri methi', 'musk melon seeds', 'eggplant', 'cabbage', 'salt', 'apricots', 'dry chilli', 'canned coconut milk', 'basmati rice', 'oil', 'beaten rice flakes', 'banana', 'besan', 'dried mango', 'drumstick', 'roasted gram flour', 'whole urad dal', 'baking soda', 'sooji', 'sesame', 'wheat flour', 'capsicum', 'almonds', 'vermicelli pudding', 'fresh coconut', 'poppy seeds', 'green moong beans', 'curry leaves', 'dal', 'bhuna chana', 'whole wheat rava', 'sweetened milk', 'alum powder', 'tea bags', 'jaggery', 'tandoori masala', 'fish fillets', 'panch phoran masala', 'mawa', 'dates', 'dinner rolls', 'valor papdi', 'slivered almonds', 'tomato', 'fried milk power', 'gobi', 'whole red beans', 'sesame oil', 'mutton', 'black sesame seeds', 'green chilli', 'chenna cheese', 'moong beans', 'lemon juice', 'chicken', 'methi leaves', 'low fat', 'tomato sauce', 'maida', 'bay leaf', 'corn flour', 'spices', 'green beans', 'citric acid', 'mint', 'baby corn', 'pistachio', 'curd', 'mustard green', 'rice flakes', 'semolina', 'gooseberry', 'malai', 'cashew nuts', 'red chillies', 'moong dal', 'baby potatoes', 'chilli', 'prawns', 'cashews and raisins', 'spinach', 'refined flour', 'green bell pepper', 'skinless chicken breasts', 'kewra', 'chickpea flour', 'dahi', 'ridge gourd', 'coconut milk', 'cinnamon stick', 'lentil flour', 'idli rice', 'wild mushrooms', 'green garlic chutney', 'molu leaf', 'mustard seed', 'mixed nuts', 'imli', 'sugar syrup', 'green cardamom', 'elephant foot yam', 'baingan', 'sliced almonds', 'amaranth leaves', 'saffron', 'garam masala powder', 'reduced milk', 'pigeon peas', 'cinnamon', 'makki atta', 'grated coconut', 'all purpose flour', 'tindora', 'chopped tomatoes', 'sambar powder', 'thick poha', 'red pepper', 'biryani masala', 'split urad dal', 'clarified butter', 'pearl millet flour', 'besan flour', 'beans', 'khoa', 'mashed potato', 'fish roe', 'almond', 'chole', 'peanut', 'rose water', 'cucumber', 'sugar', 'bell peppers'}

The ingredients of in the food are extracted and a vector dataframe is created using the ingredients. Each row is a vector that represents the ingredients in the dish.

In [30]:
food_columns = pd.DataFrame()

for i, ingredients in enumerate(df['ingredients']):
    for food in ingredients.split(','):
        if food.strip().lower() in food_vocab:
            food_columns.loc[i, food.strip().lower()] = 1

food_columns = food_columns.fillna(0)
In [31]:
food_columns
Out[31]:
maida flour yogurt oil sugar gram flour ghee carrots milk cashews raisins ... soaked rice cardamom pods red pepper watercress glutinous rice egg yolks dry dates dried rose petals arrowroot powder ginger powder
0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
250 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
251 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
252 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0
253 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
254 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0

255 rows × 365 columns

In [32]:
data = pd.read_csv('indian_food.csv')
data = data.drop(['name', 'ingredients'], axis=1)

The column name is removed because it provdes no information to the model. Ingredients feature is removed and instead of the ingedients the vector equivalent of the ingredients would be used.

In [33]:
{column: list(data[column].unique()) for column in data.columns if data.dtypes[column] == 'object'}
Out[33]:
{'diet': ['vegetarian', 'non vegetarian'],
 'flavor_profile': ['sweet', 'spicy', 'bitter', '-1', 'sour'],
 'course': ['dessert', 'main course', 'starter', 'snack'],
 'state': ['West Bengal',
  'Rajasthan',
  'Punjab',
  'Uttar Pradesh',
  '-1',
  'Odisha',
  'Maharashtra',
  'Uttarakhand',
  'Assam',
  'Bihar',
  'Andhra Pradesh',
  'Karnataka',
  'Telangana',
  'Kerala',
  'Tamil Nadu',
  'Gujarat',
  'Tripura',
  'Manipur',
  'Nagaland',
  'NCT of Delhi',
  'Jammu & Kashmir',
  'Chhattisgarh',
  'Haryana',
  'Madhya Pradesh',
  'Goa'],
 'region': ['East',
  'West',
  'North',
  '-1',
  'North East',
  'South',
  'Central',
  nan]}

Unique values are confirmed in all the categorical variables in the dataset. It can be seen that there are "-1" values in the dataset the needs to be removed.

In [34]:
data[['flavor_profile', 'state', 'region']] = data[['flavor_profile', 'state', 'region']].replace('-1', np.NaN)

The "-1" values in the dataset are replaced with NaN. This would be later replaced with the mean of the corresponding feature for the continious numerical features.

In [35]:
def onehot_encode(df, columns, prefixes):
    df = df.copy()
    for column, prefix in zip(columns, prefixes):
        dummies = pd.get_dummies(df[column], prefix=prefix)
        df = pd.concat([df, dummies], axis=1)
        df = df.drop(column, axis=1)
    return df
In [36]:
data = onehot_encode(
    data,
    ['flavor_profile', 'course', 'state', 'region'],
    ['f', 'c', 's', 'r']
)

'flavor_profile', 'course', 'state', 'region' are one hot encoded for model building.

In [37]:
data
Out[37]:
diet prep_time cook_time f_bitter f_sour f_spicy f_sweet c_dessert c_main course c_snack ... s_Tripura s_Uttar Pradesh s_Uttarakhand s_West Bengal r_Central r_East r_North r_North East r_South r_West
0 vegetarian 45 25 0 0 0 1 1 0 0 ... 0 0 0 1 0 1 0 0 0 0
1 vegetarian 80 30 0 0 0 1 1 0 0 ... 0 0 0 0 0 0 0 0 0 1
2 vegetarian 15 60 0 0 0 1 1 0 0 ... 0 0 0 0 0 0 1 0 0 0
3 vegetarian 15 30 0 0 0 1 1 0 0 ... 0 0 0 0 0 0 0 0 0 1
4 vegetarian 15 40 0 0 0 1 1 0 0 ... 0 0 0 1 0 1 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
250 vegetarian 5 30 0 0 0 1 1 0 0 ... 0 0 0 0 0 0 0 1 0 0
251 vegetarian 20 60 0 0 0 1 1 0 0 ... 0 0 0 0 0 0 0 0 0 1
252 vegetarian -1 -1 0 0 0 1 1 0 0 ... 0 0 0 0 0 0 1 0 0 0
253 vegetarian 20 45 0 0 0 1 1 0 0 ... 0 0 0 0 1 0 0 0 0 0
254 vegetarian -1 -1 0 0 0 1 1 0 0 ... 0 0 0 0 0 0 0 0 0 1

255 rows × 41 columns

In [38]:
data[['prep_time', 'cook_time']] = data[['prep_time', 'cook_time']].replace(-1, np.NaN)
In [39]:
data['prep_time'] = data['prep_time'].fillna(data['prep_time'].mean())
data['cook_time'] = data['cook_time'].fillna(data['cook_time'].mean())
In [40]:
label_encoder = LabelEncoder()
data['diet'] = label_encoder.fit_transform(data['diet'])
In [41]:
{index: label for index, label in enumerate(label_encoder.classes_)}
Out[41]:
{0: 'non vegetarian', 1: 'vegetarian'}

Model Building & Evaluation

Neural Network using Keras Sequential Model

In [42]:
y = data['diet']
X = data.drop('diet', axis=1)
X_food = pd.concat([X, food_columns], axis=1)
In [43]:
food_columns.shape
Out[43]:
(255, 365)
In [44]:
sc = StandardScaler()

X = sc.fit_transform(X)
X_food = sc.fit_transform(X_food)

Standard Scaler is used to standardize and normalize the values in the dataset.

In [45]:
X_food.shape
Out[45]:
(255, 405)
In [46]:
from imblearn.over_sampling import SMOTE
smt=SMOTE(random_state=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=42)
X_train_smt,y_train_smt = smt.fit_resample(X_train,y_train)

X_food_train, X_food_test, y_food_train, y_food_test = train_test_split(X_food, y, train_size=0.7, random_state=42)
X_food_train_smt,y_food_train_smt = smt.fit_resample(X_food_train,y_food_train)

Train Test split is used to create 2 sets to data. First without ingredient vector dataframe attached and sencond with the ingredient vector dataframe appended to the other normalized and encoded features. SMOTE analysis is used to comensate the under-represented class's data points. After SMOTE we have 156 data points for both classes.

In [47]:
print('Train Data - Class Split - Without Ingredient Vectors')
Outcome_0= (y_train_smt == 0).sum()
Outcome_1 = (y_train_smt == 1).sum()
print('Class 0 (Non Vegetarian)-',  Outcome_0)
print('Class 1 (Vegetarian)-',  Outcome_1)
print('\n')
print('Train Data - Class Split - With Ingredient Vectors')
Outcome_0= (y_food_train_smt == 0).sum()
Outcome_1 = (y_food_train_smt == 1).sum()
print('Class 0 (Non Vegetarian)-',  Outcome_0)
print('Class 1 (Vegetarian)-',  Outcome_1)
Train Data - Class Split - Without Ingredient Vectors
Class 0 (Non Vegetarian)- 156
Class 1 (Vegetarian)- 156


Train Data - Class Split - With Ingredient Vectors
Class 0 (Non Vegetarian)- 156
Class 1 (Vegetarian)- 156

Importing Model Libraries

In [49]:
import tensorflow
import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, InputLayer 
import tensorflow.keras.metrics
Using TensorFlow backend.

Neural Network

Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. Neural networks help us cluster and classify.In a neural network a single perceptron (neuron) can be imagined as a Logistic Regression. Artificial Neural Network, or ANN, is a group of multiple perceptrons/ neurons at each layer. ANN is also known as a Feed-Forward Neural network because inputs are processed only in the forward direction.

It consists of 3 layers – Input, Hidden and Output. The input layer accepts the inputs, the hidden layer processes the inputs, and the output layer produces the result. Essentially, each layer tries to learn certain weights.

In [50]:
def build_model(num_features, hidden_layer_sizes=(64, 64)):

    model = Sequential()
    model.add(InputLayer(input_shape=(num_features, )))
    model.add(Dense(hidden_layer_sizes[0], activation='relu'))
    model.add(Dense(hidden_layer_sizes[1], activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy', tensorflow.keras.metrics.AUC(name='auc')])
    
    model.summary()
    
    return model

The First model is an Neiral Network which is created with the dataset without the ingredient vectors appended. It has has 4 layers Input, Output and 2 hidden layers. The input layers has the size equivalent to the number of features in the dataset. In this case 40. The batch size is kept 64 and the the number of epochs is defined as 41.

In [51]:
X.shape
Out[51]:
(255, 40)
In [52]:
model = build_model(40)

batch_size = 64
epochs = 41

history = model.fit(
    X_train_smt,
    y_train_smt,
    validation_split=0.2,
    batch_size=batch_size,
    epochs=epochs
)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 64)                2624      
_________________________________________________________________
dense_1 (Dense)              (None, 64)                4160      
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 65        
=================================================================
Total params: 6,849
Trainable params: 6,849
Non-trainable params: 0
_________________________________________________________________
Train on 249 samples, validate on 63 samples
Epoch 1/41
249/249 [==============================] - 2s 8ms/sample - loss: 0.6327 - accuracy: 0.6586 - auc: 0.8682 - val_loss: 0.4757 - val_accuracy: 0.9841 - val_auc: 0.0000e+00
Epoch 2/41
249/249 [==============================] - 0s 189us/sample - loss: 0.5209 - accuracy: 0.8394 - auc: 0.9478 - val_loss: 0.4829 - val_accuracy: 0.9683 - val_auc: 0.0000e+00
Epoch 3/41
249/249 [==============================] - 0s 130us/sample - loss: 0.4471 - accuracy: 0.9036 - auc: 0.9697 - val_loss: 0.4689 - val_accuracy: 0.9048 - val_auc: 0.0000e+00
Epoch 4/41
249/249 [==============================] - 0s 110us/sample - loss: 0.3901 - accuracy: 0.9438 - auc: 0.9770 - val_loss: 0.4259 - val_accuracy: 0.8889 - val_auc: 0.0000e+00
Epoch 5/41
249/249 [==============================] - 0s 154us/sample - loss: 0.3414 - accuracy: 0.9518 - auc: 0.9817 - val_loss: 0.3809 - val_accuracy: 0.8571 - val_auc: 0.0000e+00
Epoch 6/41
249/249 [==============================] - 0s 154us/sample - loss: 0.3014 - accuracy: 0.9518 - auc: 0.9809 - val_loss: 0.3396 - val_accuracy: 0.8571 - val_auc: 0.0000e+00
Epoch 7/41
249/249 [==============================] - 0s 102us/sample - loss: 0.2683 - accuracy: 0.9558 - auc: 0.9833 - val_loss: 0.3002 - val_accuracy: 0.8413 - val_auc: 0.0000e+00
Epoch 8/41
249/249 [==============================] - 0s 122us/sample - loss: 0.2403 - accuracy: 0.9558 - auc: 0.9838 - val_loss: 0.2671 - val_accuracy: 0.8571 - val_auc: 0.0000e+00
Epoch 9/41
249/249 [==============================] - 0s 100us/sample - loss: 0.2160 - accuracy: 0.9639 - auc: 0.9854 - val_loss: 0.2471 - val_accuracy: 0.8571 - val_auc: 0.0000e+00
Epoch 10/41
249/249 [==============================] - 0s 125us/sample - loss: 0.1962 - accuracy: 0.9598 - auc: 0.9866 - val_loss: 0.2336 - val_accuracy: 0.8571 - val_auc: 0.0000e+00
Epoch 11/41
249/249 [==============================] - 0s 88us/sample - loss: 0.1798 - accuracy: 0.9639 - auc: 0.9868 - val_loss: 0.2226 - val_accuracy: 0.8730 - val_auc: 0.0000e+00
Epoch 12/41
249/249 [==============================] - 0s 127us/sample - loss: 0.1649 - accuracy: 0.9639 - auc: 0.9888 - val_loss: 0.2175 - val_accuracy: 0.8730 - val_auc: 0.0000e+00
Epoch 13/41
249/249 [==============================] - 0s 105us/sample - loss: 0.1533 - accuracy: 0.9639 - auc: 0.9880 - val_loss: 0.2156 - val_accuracy: 0.8730 - val_auc: 0.0000e+00
Epoch 14/41
249/249 [==============================] - 0s 87us/sample - loss: 0.1443 - accuracy: 0.9639 - auc: 0.9884 - val_loss: 0.2047 - val_accuracy: 0.8730 - val_auc: 0.0000e+00
Epoch 15/41
249/249 [==============================] - 0s 115us/sample - loss: 0.1361 - accuracy: 0.9639 - auc: 0.9881 - val_loss: 0.1930 - val_accuracy: 0.8730 - val_auc: 0.0000e+00
Epoch 16/41
249/249 [==============================] - 0s 136us/sample - loss: 0.1294 - accuracy: 0.9719 - auc: 0.9918 - val_loss: 0.1834 - val_accuracy: 0.8730 - val_auc: 0.0000e+00
Epoch 17/41
249/249 [==============================] - 0s 107us/sample - loss: 0.1235 - accuracy: 0.9719 - auc: 0.9881 - val_loss: 0.1840 - val_accuracy: 0.8730 - val_auc: 0.0000e+00
Epoch 18/41
249/249 [==============================] - 0s 104us/sample - loss: 0.1190 - accuracy: 0.9719 - auc: 0.9888 - val_loss: 0.1814 - val_accuracy: 0.8730 - val_auc: 0.0000e+00
Epoch 19/41
249/249 [==============================] - 0s 107us/sample - loss: 0.1155 - accuracy: 0.9719 - auc: 0.9902 - val_loss: 0.1713 - val_accuracy: 0.8730 - val_auc: 0.0000e+00
Epoch 20/41
249/249 [==============================] - 0s 116us/sample - loss: 0.1114 - accuracy: 0.9759 - auc: 0.9908 - val_loss: 0.1718 - val_accuracy: 0.8730 - val_auc: 0.0000e+00
Epoch 21/41
249/249 [==============================] - 0s 125us/sample - loss: 0.1089 - accuracy: 0.9759 - auc: 0.9886 - val_loss: 0.1716 - val_accuracy: 0.8730 - val_auc: 0.0000e+00
Epoch 22/41
249/249 [==============================] - 0s 114us/sample - loss: 0.1063 - accuracy: 0.9759 - auc: 0.9905 - val_loss: 0.1650 - val_accuracy: 0.8730 - val_auc: 0.0000e+00
Epoch 23/41
249/249 [==============================] - 0s 106us/sample - loss: 0.1037 - accuracy: 0.9759 - auc: 0.9915 - val_loss: 0.1634 - val_accuracy: 0.8730 - val_auc: 0.0000e+00
Epoch 24/41
249/249 [==============================] - 0s 100us/sample - loss: 0.1021 - accuracy: 0.9759 - auc: 0.9918 - val_loss: 0.1640 - val_accuracy: 0.8730 - val_auc: 0.0000e+00
Epoch 25/41
249/249 [==============================] - 0s 134us/sample - loss: 0.1002 - accuracy: 0.9759 - auc: 0.9920 - val_loss: 0.1597 - val_accuracy: 0.9048 - val_auc: 0.0000e+00
Epoch 26/41
249/249 [==============================] - 0s 150us/sample - loss: 0.0986 - accuracy: 0.9759 - auc: 0.9927 - val_loss: 0.1540 - val_accuracy: 0.9206 - val_auc: 0.0000e+00
Epoch 27/41
249/249 [==============================] - 0s 141us/sample - loss: 0.0969 - accuracy: 0.9759 - auc: 0.9933 - val_loss: 0.1537 - val_accuracy: 0.9206 - val_auc: 0.0000e+00
Epoch 28/41
249/249 [==============================] - 0s 88us/sample - loss: 0.0958 - accuracy: 0.9759 - auc: 0.9930 - val_loss: 0.1506 - val_accuracy: 0.9365 - val_auc: 0.0000e+00
Epoch 29/41
249/249 [==============================] - 0s 130us/sample - loss: 0.0947 - accuracy: 0.9759 - auc: 0.9931 - val_loss: 0.1584 - val_accuracy: 0.9206 - val_auc: 0.0000e+00
Epoch 30/41
249/249 [==============================] - 0s 93us/sample - loss: 0.0933 - accuracy: 0.9759 - auc: 0.9935 - val_loss: 0.1539 - val_accuracy: 0.9206 - val_auc: 0.0000e+00
Epoch 31/41
249/249 [==============================] - 0s 96us/sample - loss: 0.0924 - accuracy: 0.9759 - auc: 0.9937 - val_loss: 0.1495 - val_accuracy: 0.9365 - val_auc: 0.0000e+00
Epoch 32/41
249/249 [==============================] - 0s 137us/sample - loss: 0.0918 - accuracy: 0.9759 - auc: 0.9941 - val_loss: 0.1444 - val_accuracy: 0.9365 - val_auc: 0.0000e+00
Epoch 33/41
249/249 [==============================] - 0s 154us/sample - loss: 0.0908 - accuracy: 0.9759 - auc: 0.9940 - val_loss: 0.1533 - val_accuracy: 0.9365 - val_auc: 0.0000e+00
Epoch 34/41
249/249 [==============================] - 0s 127us/sample - loss: 0.0897 - accuracy: 0.9759 - auc: 0.9939 - val_loss: 0.1547 - val_accuracy: 0.9365 - val_auc: 0.0000e+00
Epoch 35/41
249/249 [==============================] - 0s 153us/sample - loss: 0.0888 - accuracy: 0.9759 - auc: 0.9933 - val_loss: 0.1522 - val_accuracy: 0.9365 - val_auc: 0.0000e+00
Epoch 36/41
249/249 [==============================] - 0s 151us/sample - loss: 0.0880 - accuracy: 0.9759 - auc: 0.9940 - val_loss: 0.1410 - val_accuracy: 0.9365 - val_auc: 0.0000e+00
Epoch 37/41
249/249 [==============================] - 0s 140us/sample - loss: 0.0873 - accuracy: 0.9759 - auc: 0.9942 - val_loss: 0.1379 - val_accuracy: 0.9365 - val_auc: 0.0000e+00
Epoch 38/41
249/249 [==============================] - 0s 146us/sample - loss: 0.0866 - accuracy: 0.9759 - auc: 0.9945 - val_loss: 0.1361 - val_accuracy: 0.9365 - val_auc: 0.0000e+00
Epoch 39/41
249/249 [==============================] - 0s 121us/sample - loss: 0.0861 - accuracy: 0.9759 - auc: 0.9943 - val_loss: 0.1425 - val_accuracy: 0.9365 - val_auc: 0.0000e+00
Epoch 40/41
249/249 [==============================] - 0s 124us/sample - loss: 0.0849 - accuracy: 0.9759 - auc: 0.9943 - val_loss: 0.1414 - val_accuracy: 0.9365 - val_auc: 0.0000e+00
Epoch 41/41
249/249 [==============================] - 0s 144us/sample - loss: 0.0846 - accuracy: 0.9759 - auc: 0.9949 - val_loss: 0.1343 - val_accuracy: 0.9365 - val_auc: 0.0000e+00
In [53]:
plt.figure(figsize=(20, 10))

epochs_range = range(1, epochs + 1)
train_loss, val_loss = history.history['loss'], history.history['val_loss']
train_auc, val_auc = history.history['auc'], history.history['val_auc']

plt.subplot(1, 2, 1)
plt.plot(epochs_range, train_loss, label="Training Loss")
plt.plot(epochs_range, val_loss, label="Validation Loss")
plt.title("Loss")
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_auc, label="Training AUC")
plt.plot(epochs_range, val_auc, label="Validation AUC")
plt.title("AUC")
plt.legend()

plt.show()

The performance of the model is plotted. As it can be seen

In the first plot the validation loss decreases with the training loss. At the end of the graph there are a few fluctuations. This is the optimum amount upto which the validation loss can be reduced. As it can be seen ahead that the position at which the loss is minimum is 40. So it cannot be reduced further.

The second graph shows the training AC vs the validation AUC. The training AUC increases initially and then becomes constant. But the validation AUC is constant which show the TPR-FPR (True Positive Rate - False Positive Rate) is constant. In addition the the point at which the value of AUC is maximum is 0.

In [54]:
print(np.argmin(val_loss), np.argmax(val_auc))
40 0
In [55]:
model.evaluate(X_test, y_test)
77/77 [==============================] - 0s 181us/sample - loss: 0.6198 - accuracy: 0.8571 - auc: 0.5490
Out[55]:
[0.6197699362581427, 0.85714287, 0.5489796]

The average loss, accuracy and AUC are 0.486, 0.857, 0.682 respectively. The model has a good accuracy but a decent AUC.

In [56]:
from sklearn.metrics import classification_report, confusion_matrix

# predict probabilities for test set
yhat_probs = model.predict(X_test, verbose=0)
# predict crisp classes for test set
yhat_classes = model.predict_classes(X_test, verbose=0)
# confusion matrix
print('\nConfusion Matrix and Classification Report\n')
matrix = confusion_matrix(y_test, yhat_classes)
print(matrix)
target_names=['Class 0 - non-vegetarian','Class 1 - vegetarian']
print(classification_report(y_test, yhat_classes, target_names=target_names))
Confusion Matrix and Classification Report

[[ 2  5]
 [ 6 64]]
                          precision    recall  f1-score   support

Class 0 - non-vegetarian       0.25      0.29      0.27         7
    Class 1 - vegetarian       0.93      0.91      0.92        70

                accuracy                           0.86        77
               macro avg       0.59      0.60      0.59        77
            weighted avg       0.87      0.86      0.86        77

Model Evaluation

For Neural Network model evaluation Without the ingredient vectors The Confusion Matrix and Classification Report for the Standard Model gives the following results as seen:

• For Class 0 (Non-Vegetarian), 2 identified correctly 6 identified incorrectly.
• For Class 1 (Vegetarian), 64 identified correctly 5 identified incorrectly.
• True Positives – Number of correctly predicted positive values is 2.
• True Negatives - Number of correctly predicted negative values is 64.
• False Positives – Number of negative values incorrectly predicted as positive values is 6.
• False Negatives – Number of positive values incorrectly predicted as negative is 5.

The model has overall Precision of 87% and overall Accuracy of 86%
The Precision for Class 0 / Non-Vegetarian is 25% due to large number of False positives.

In [57]:
X_food.shape
Out[57]:
(255, 405)

The Second model is a Neural Network which is created with the dataset with the ingredient vectors appended. It has has 4 layers Input, Output and 2 hidden layers. The input layers has the size equivalent to the number of features in the dataset which includes the food vectors dataframe appended to the transformed dataframe. In this case 405. The batch size is kept 64 and the the number of epochs is defined as 200.

In [58]:
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

food_model = build_model(X_food.shape[1], hidden_layer_sizes=(128, 128))

food_batch_size = 64
food_epochs = 200

food_history = food_model.fit(
    X_food_train_smt,
    y_food_train_smt,
    validation_split=0.2,
    batch_size=food_batch_size,
    epochs=food_epochs
)
WARNING:tensorflow:From C:\Users\rosha\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\compat\v2_compat.py:88: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:tensorflow:From C:\Users\rosha\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_3 (Dense)              (None, 128)               51968     
_________________________________________________________________
dense_4 (Dense)              (None, 128)               16512     
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 129       
=================================================================
Total params: 68,609
Trainable params: 68,609
Non-trainable params: 0
_________________________________________________________________
Train on 249 samples, validate on 63 samples
Epoch 1/200
249/249 [==============================] - 0s 1ms/sample - loss: 0.7470 - acc: 0.5502 - auc: 0.5601 - val_loss: 0.6891 - val_acc: 0.5873 - val_auc: 0.0000e+00
Epoch 2/200
249/249 [==============================] - 0s 90us/sample - loss: 0.4116 - acc: 0.8635 - auc: 0.9515 - val_loss: 0.4286 - val_acc: 0.8571 - val_auc: 0.0000e+00
Epoch 3/200
249/249 [==============================] - 0s 112us/sample - loss: 0.2418 - acc: 0.9719 - auc: 0.9996 - val_loss: 0.2073 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 4/200
249/249 [==============================] - 0s 141us/sample - loss: 0.1445 - acc: 0.9960 - auc: 0.9999 - val_loss: 0.0937 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 5/200
249/249 [==============================] - 0s 108us/sample - loss: 0.0864 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0454 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 6/200
249/249 [==============================] - 0s 126us/sample - loss: 0.0525 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0264 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 7/200
249/249 [==============================] - 0s 105us/sample - loss: 0.0317 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0168 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 8/200
249/249 [==============================] - 0s 115us/sample - loss: 0.0193 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0115 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 9/200
249/249 [==============================] - 0s 123us/sample - loss: 0.0126 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0083 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 10/200
249/249 [==============================] - 0s 112us/sample - loss: 0.0085 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0063 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 11/200
249/249 [==============================] - 0s 80us/sample - loss: 0.0063 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0049 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 12/200
249/249 [==============================] - 0s 77us/sample - loss: 0.0047 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0040 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 13/200
249/249 [==============================] - 0s 79us/sample - loss: 0.0038 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0033 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 14/200
249/249 [==============================] - 0s 91us/sample - loss: 0.0030 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0027 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 15/200
249/249 [==============================] - 0s 89us/sample - loss: 0.0026 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0023 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 16/200
249/249 [==============================] - 0s 104us/sample - loss: 0.0022 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0020 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 17/200
249/249 [==============================] - 0s 108us/sample - loss: 0.0019 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0018 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 18/200
249/249 [==============================] - 0s 149us/sample - loss: 0.0017 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0016 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 19/200
249/249 [==============================] - 0s 109us/sample - loss: 0.0016 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0014 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 20/200
249/249 [==============================] - 0s 118us/sample - loss: 0.0014 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0013 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 21/200
249/249 [==============================] - 0s 107us/sample - loss: 0.0013 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0012 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 22/200
249/249 [==============================] - 0s 77us/sample - loss: 0.0012 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0011 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 23/200
249/249 [==============================] - 0s 121us/sample - loss: 0.0011 - acc: 1.0000 - auc: 1.0000 - val_loss: 0.0010 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 24/200
249/249 [==============================] - 0s 149us/sample - loss: 0.0011 - acc: 1.0000 - auc: 1.0000 - val_loss: 9.6782e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 25/200
249/249 [==============================] - 0s 180us/sample - loss: 9.9810e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 9.0916e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 26/200
249/249 [==============================] - 0s 125us/sample - loss: 9.3618e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 8.5600e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 27/200
249/249 [==============================] - 0s 64us/sample - loss: 8.8166e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 8.0761e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 28/200
249/249 [==============================] - 0s 113us/sample - loss: 8.3370e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 7.6500e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 29/200
249/249 [==============================] - 0s 141us/sample - loss: 7.8754e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 7.2590e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 30/200
249/249 [==============================] - 0s 139us/sample - loss: 7.4633e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 6.9097e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 31/200
249/249 [==============================] - 0s 128us/sample - loss: 7.0586e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 6.5810e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 32/200
249/249 [==============================] - 0s 157us/sample - loss: 6.7027e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 6.2617e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 33/200
249/249 [==============================] - ETA: 0s - loss: 8.4702e-04 - acc: 1.0000 - auc: 1.000 - 0s 129us/sample - loss: 6.3806e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 5.9728e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 34/200
249/249 [==============================] - 0s 163us/sample - loss: 6.0678e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 5.7026e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 35/200
249/249 [==============================] - 0s 161us/sample - loss: 5.7810e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 5.4382e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 36/200
249/249 [==============================] - 0s 156us/sample - loss: 5.5124e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 5.2195e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 37/200
249/249 [==============================] - 0s 156us/sample - loss: 5.2600e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.9899e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 38/200
249/249 [==============================] - 0s 157us/sample - loss: 5.0377e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.7909e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 39/200
249/249 [==============================] - 0s 83us/sample - loss: 4.8214e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.6040e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 40/200
249/249 [==============================] - 0s 67us/sample - loss: 4.6031e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.4185e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 41/200
249/249 [==============================] - 0s 98us/sample - loss: 4.4175e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.2416e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 42/200
249/249 [==============================] - 0s 156us/sample - loss: 4.2312e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.0745e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 43/200
249/249 [==============================] - 0s 157us/sample - loss: 4.0703e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.9260e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 44/200
249/249 [==============================] - 0s 155us/sample - loss: 3.9030e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.7829e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 45/200
249/249 [==============================] - 0s 171us/sample - loss: 3.7521e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.6403e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 46/200
249/249 [==============================] - 0s 158us/sample - loss: 3.6179e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.5054e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 47/200
249/249 [==============================] - 0s 157us/sample - loss: 3.4690e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.3890e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 48/200
249/249 [==============================] - 0s 149us/sample - loss: 3.3481e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.2678e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 49/200
249/249 [==============================] - 0s 158us/sample - loss: 3.2331e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.1490e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 50/200
249/249 [==============================] - 0s 161us/sample - loss: 3.1143e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.0451e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 51/200
249/249 [==============================] - 0s 156us/sample - loss: 3.0033e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.9452e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 52/200
249/249 [==============================] - 0s 170us/sample - loss: 2.9001e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.8440e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 53/200
249/249 [==============================] - 0s 152us/sample - loss: 2.8041e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.7550e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 54/200
249/249 [==============================] - 0s 144us/sample - loss: 2.7132e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.6701e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 55/200
249/249 [==============================] - 0s 152us/sample - loss: 2.6241e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.5867e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 56/200
249/249 [==============================] - 0s 161us/sample - loss: 2.5398e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.5105e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 57/200
249/249 [==============================] - 0s 169us/sample - loss: 2.4563e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.4362e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 58/200
249/249 [==============================] - 0s 193us/sample - loss: 2.3808e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.3599e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 59/200
249/249 [==============================] - 0s 165us/sample - loss: 2.3036e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.2889e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 60/200
249/249 [==============================] - 0s 156us/sample - loss: 2.2349e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.2210e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 61/200
249/249 [==============================] - 0s 166us/sample - loss: 2.1666e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.1584e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 62/200
249/249 [==============================] - 0s 153us/sample - loss: 2.1042e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.0974e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 63/200
249/249 [==============================] - 0s 107us/sample - loss: 2.0438e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.0404e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 64/200
249/249 [==============================] - 0s 72us/sample - loss: 1.9802e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.9844e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 65/200
249/249 [==============================] - 0s 65us/sample - loss: 1.9252e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.9317e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 66/200
249/249 [==============================] - 0s 73us/sample - loss: 1.8705e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.8794e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 67/200
249/249 [==============================] - 0s 89us/sample - loss: 1.8190e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.8285e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 68/200
249/249 [==============================] - 0s 109us/sample - loss: 1.7681e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.7833e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 69/200
249/249 [==============================] - 0s 125us/sample - loss: 1.7197e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.7369e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 70/200
249/249 [==============================] - 0s 168us/sample - loss: 1.6736e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.6930e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 71/200
249/249 [==============================] - 0s 148us/sample - loss: 1.6272e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.6479e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 72/200
249/249 [==============================] - 0s 150us/sample - loss: 1.5860e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.6076e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 73/200
249/249 [==============================] - 0s 159us/sample - loss: 1.5434e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.5689e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 74/200
249/249 [==============================] - 0s 164us/sample - loss: 1.5050e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.5317e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 75/200
249/249 [==============================] - 0s 151us/sample - loss: 1.4666e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.4954e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 76/200
249/249 [==============================] - 0s 138us/sample - loss: 1.4282e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.4586e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 77/200
249/249 [==============================] - 0s 140us/sample - loss: 1.3912e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.4230e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 78/200
249/249 [==============================] - 0s 154us/sample - loss: 1.3557e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.3902e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 79/200
249/249 [==============================] - 0s 103us/sample - loss: 1.3225e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.3587e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 80/200
249/249 [==============================] - 0s 72us/sample - loss: 1.2899e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.3284e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 81/200
249/249 [==============================] - 0s 68us/sample - loss: 1.2566e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.2984e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 82/200
249/249 [==============================] - 0s 101us/sample - loss: 1.2259e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.2692e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 83/200
249/249 [==============================] - 0s 88us/sample - loss: 1.1958e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.2401e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 84/200
249/249 [==============================] - 0s 87us/sample - loss: 1.1676e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.2137e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 85/200
249/249 [==============================] - 0s 85us/sample - loss: 1.1391e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.1878e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 86/200
249/249 [==============================] - 0s 96us/sample - loss: 1.1097e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.1623e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 87/200
249/249 [==============================] - 0s 100us/sample - loss: 1.0836e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.1367e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 88/200
249/249 [==============================] - 0s 113us/sample - loss: 1.0569e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.1129e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 89/200
249/249 [==============================] - 0s 112us/sample - loss: 1.0339e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.0890e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 90/200
249/249 [==============================] - 0s 111us/sample - loss: 1.0098e-04 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.0676e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 91/200
249/249 [==============================] - 0s 94us/sample - loss: 9.8582e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.0456e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 92/200
249/249 [==============================] - 0s 108us/sample - loss: 9.6355e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.0241e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 93/200
249/249 [==============================] - 0s 111us/sample - loss: 9.4088e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 1.0032e-04 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 94/200
249/249 [==============================] - 0s 105us/sample - loss: 9.1962e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 9.8322e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 95/200
249/249 [==============================] - 0s 95us/sample - loss: 8.9888e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 9.6460e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 96/200
249/249 [==============================] - 0s 90us/sample - loss: 8.7860e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 9.4574e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 97/200
249/249 [==============================] - 0s 77us/sample - loss: 8.5908e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 9.2702e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 98/200
249/249 [==============================] - 0s 61us/sample - loss: 8.4124e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 9.1059e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 99/200
249/249 [==============================] - 0s 94us/sample - loss: 8.2113e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 8.9390e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 100/200
249/249 [==============================] - 0s 85us/sample - loss: 8.0454e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 8.7750e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 101/200
249/249 [==============================] - 0s 99us/sample - loss: 7.8767e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 8.6057e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 102/200
249/249 [==============================] - 0s 98us/sample - loss: 7.7109e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 8.4430e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 103/200
249/249 [==============================] - 0s 120us/sample - loss: 7.5409e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 8.2737e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 104/200
249/249 [==============================] - 0s 114us/sample - loss: 7.3795e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 8.1208e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 105/200
249/249 [==============================] - 0s 108us/sample - loss: 7.2340e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 7.9718e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 106/200
249/249 [==============================] - 0s 92us/sample - loss: 7.0775e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 7.8334e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 107/200
249/249 [==============================] - 0s 107us/sample - loss: 6.9368e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 7.6937e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 108/200
249/249 [==============================] - 0s 107us/sample - loss: 6.8036e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 7.5478e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 109/200
249/249 [==============================] - 0s 146us/sample - loss: 6.6568e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 7.4131e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 110/200
249/249 [==============================] - 0s 172us/sample - loss: 6.5260e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 7.2767e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 111/200
249/249 [==============================] - 0s 157us/sample - loss: 6.4018e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 7.1492e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 112/200
249/249 [==============================] - 0s 171us/sample - loss: 6.2741e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 7.0264e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 113/200
249/249 [==============================] - 0s 78us/sample - loss: 6.1495e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 6.9051e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 114/200
249/249 [==============================] - 0s 49us/sample - loss: 6.0329e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 6.7759e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 115/200
249/249 [==============================] - 0s 86us/sample - loss: 5.9190e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 6.6590e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 116/200
249/249 [==============================] - 0s 90us/sample - loss: 5.7992e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 6.5493e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 117/200
249/249 [==============================] - 0s 113us/sample - loss: 5.6976e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 6.4364e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 118/200
249/249 [==============================] - 0s 154us/sample - loss: 5.5889e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 6.3288e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 119/200
249/249 [==============================] - 0s 147us/sample - loss: 5.4895e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 6.2217e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 120/200
249/249 [==============================] - 0s 144us/sample - loss: 5.3898e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 6.1139e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 121/200
249/249 [==============================] - 0s 155us/sample - loss: 5.2835e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 6.0188e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 122/200
249/249 [==============================] - 0s 153us/sample - loss: 5.1947e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 5.9233e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 123/200
249/249 [==============================] - 0s 152us/sample - loss: 5.1023e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 5.8208e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 124/200
249/249 [==============================] - 0s 139us/sample - loss: 5.0092e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 5.7278e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 125/200
249/249 [==============================] - 0s 151us/sample - loss: 4.9206e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 5.6332e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 126/200
249/249 [==============================] - 0s 155us/sample - loss: 4.8325e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 5.5419e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 127/200
249/249 [==============================] - 0s 65us/sample - loss: 4.7528e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 5.4559e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 128/200
249/249 [==============================] - 0s 75us/sample - loss: 4.6679e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 5.3726e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 129/200
249/249 [==============================] - 0s 146us/sample - loss: 4.5877e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 5.2928e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 130/200
249/249 [==============================] - 0s 160us/sample - loss: 4.5126e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 5.2110e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 131/200
249/249 [==============================] - 0s 152us/sample - loss: 4.4290e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 5.1310e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 132/200
249/249 [==============================] - 0s 156us/sample - loss: 4.3579e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 5.0534e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 133/200
249/249 [==============================] - 0s 159us/sample - loss: 4.2869e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.9785e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 134/200
249/249 [==============================] - 0s 170us/sample - loss: 4.2185e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.9054e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 135/200
249/249 [==============================] - 0s 136us/sample - loss: 4.1418e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.8325e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 136/200
249/249 [==============================] - 0s 157us/sample - loss: 4.0776e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.7613e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 137/200
249/249 [==============================] - 0s 166us/sample - loss: 4.0106e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.6922e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 138/200
249/249 [==============================] - 0s 85us/sample - loss: 3.9454e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.6245e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 139/200
249/249 [==============================] - 0s 77us/sample - loss: 3.8824e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.5553e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 140/200
249/249 [==============================] - 0s 129us/sample - loss: 3.8208e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.4885e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 141/200
249/249 [==============================] - 0s 169us/sample - loss: 3.7600e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.4249e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 142/200
249/249 [==============================] - 0s 168us/sample - loss: 3.6997e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.3631e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 143/200
249/249 [==============================] - 0s 160us/sample - loss: 3.6424e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.3025e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 144/200
249/249 [==============================] - 0s 160us/sample - loss: 3.5863e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.2410e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 145/200
249/249 [==============================] - 0s 185us/sample - loss: 3.5292e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.1791e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 146/200
249/249 [==============================] - 0s 182us/sample - loss: 3.4748e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.1206e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 147/200
249/249 [==============================] - 0s 179us/sample - loss: 3.4223e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.0632e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 148/200
249/249 [==============================] - 0s 147us/sample - loss: 3.3702e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 4.0094e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 149/200
249/249 [==============================] - 0s 90us/sample - loss: 3.3211e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.9550e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 150/200
249/249 [==============================] - 0s 89us/sample - loss: 3.2703e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.9003e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 151/200
249/249 [==============================] - 0s 140us/sample - loss: 3.2215e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.8498e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 152/200
249/249 [==============================] - 0s 154us/sample - loss: 3.1735e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.7989e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 153/200
249/249 [==============================] - 0s 141us/sample - loss: 3.1274e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.7498e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 154/200
249/249 [==============================] - 0s 151us/sample - loss: 3.0815e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.6987e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 155/200
249/249 [==============================] - 0s 130us/sample - loss: 3.0363e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.6505e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 156/200
249/249 [==============================] - 0s 183us/sample - loss: 2.9926e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.6045e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 157/200
249/249 [==============================] - 0s 159us/sample - loss: 2.9489e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.5572e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 158/200
249/249 [==============================] - 0s 166us/sample - loss: 2.9086e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.5122e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 159/200
249/249 [==============================] - 0s 162us/sample - loss: 2.8681e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.4680e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 160/200
249/249 [==============================] - 0s 57us/sample - loss: 2.8258e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.4234e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 161/200
249/249 [==============================] - 0s 91us/sample - loss: 2.7870e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.3801e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 162/200
249/249 [==============================] - 0s 150us/sample - loss: 2.7502e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.3377e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 163/200
249/249 [==============================] - 0s 184us/sample - loss: 2.7104e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.2965e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 164/200
249/249 [==============================] - 0s 168us/sample - loss: 2.6730e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.2550e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 165/200
249/249 [==============================] - 0s 180us/sample - loss: 2.6366e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.2156e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 166/200
249/249 [==============================] - 0s 171us/sample - loss: 2.6013e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.1733e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 167/200
249/249 [==============================] - 0s 182us/sample - loss: 2.5648e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.1356e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 168/200
249/249 [==============================] - 0s 161us/sample - loss: 2.5315e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.0997e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 169/200
249/249 [==============================] - 0s 97us/sample - loss: 2.4986e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.0615e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 170/200
249/249 [==============================] - 0s 120us/sample - loss: 2.4641e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 3.0236e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 171/200
249/249 [==============================] - 0s 89us/sample - loss: 2.4320e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.9862e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 172/200
249/249 [==============================] - 0s 61us/sample - loss: 2.4003e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.9510e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 173/200
249/249 [==============================] - 0s 134us/sample - loss: 2.3700e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.9170e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 174/200
249/249 [==============================] - 0s 192us/sample - loss: 2.3384e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.8835e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 175/200
249/249 [==============================] - 0s 189us/sample - loss: 2.3071e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.8483e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 176/200
249/249 [==============================] - 0s 173us/sample - loss: 2.2784e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.8157e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 177/200
249/249 [==============================] - 0s 154us/sample - loss: 2.2497e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.7828e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 178/200
249/249 [==============================] - 0s 151us/sample - loss: 2.2212e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.7506e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 179/200
249/249 [==============================] - 0s 196us/sample - loss: 2.1934e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.7205e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 180/200
249/249 [==============================] - 0s 149us/sample - loss: 2.1652e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.6894e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 181/200
249/249 [==============================] - 0s 97us/sample - loss: 2.1394e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.6590e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 182/200
249/249 [==============================] - 0s 131us/sample - loss: 2.1114e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.6300e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 183/200
249/249 [==============================] - 0s 187us/sample - loss: 2.0872e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.5983e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 184/200
249/249 [==============================] - 0s 189us/sample - loss: 2.0615e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.5693e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 185/200
249/249 [==============================] - 0s 170us/sample - loss: 2.0346e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.5414e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 186/200
249/249 [==============================] - 0s 170us/sample - loss: 2.0107e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.5147e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 187/200
249/249 [==============================] - 0s 181us/sample - loss: 1.9862e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.4860e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 188/200
249/249 [==============================] - 0s 185us/sample - loss: 1.9623e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.4589e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 189/200
249/249 [==============================] - 0s 167us/sample - loss: 1.9399e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.4319e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 190/200
249/249 [==============================] - 0s 82us/sample - loss: 1.9170e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.4054e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 191/200
249/249 [==============================] - 0s 83us/sample - loss: 1.8938e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.3803e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 192/200
249/249 [==============================] - 0s 174us/sample - loss: 1.8707e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.3552e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 193/200
249/249 [==============================] - 0s 183us/sample - loss: 1.8496e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.3303e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 194/200
249/249 [==============================] - 0s 194us/sample - loss: 1.8281e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.3060e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 195/200
249/249 [==============================] - 0s 170us/sample - loss: 1.8068e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.2811e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 196/200
249/249 [==============================] - 0s 93us/sample - loss: 1.7864e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.2577e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 197/200
249/249 [==============================] - 0s 82us/sample - loss: 1.7650e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.2334e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 198/200
249/249 [==============================] - 0s 143us/sample - loss: 1.7449e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.2096e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 199/200
249/249 [==============================] - 0s 166us/sample - loss: 1.7254e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.1863e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
Epoch 200/200
249/249 [==============================] - 0s 186us/sample - loss: 1.7061e-05 - acc: 1.0000 - auc: 1.0000 - val_loss: 2.1633e-05 - val_acc: 1.0000 - val_auc: 0.0000e+00
In [59]:
plt.figure(figsize=(20, 10))

food_epochs_range = range(1, food_epochs + 1)
food_train_loss, food_val_loss = food_history.history['loss'], food_history.history['val_loss']
food_train_auc, food_val_auc = food_history.history['auc'], food_history.history['val_auc']

plt.subplot(1, 2, 1)
plt.plot(food_epochs_range, food_train_loss, label="Training Loss")
plt.plot(food_epochs_range, food_val_loss, label="Validation Loss")
plt.title("Loss")
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(food_epochs_range, food_train_auc, label="Training AUC")
plt.plot(food_epochs_range, food_val_auc, label="Validation AUC")
plt.title("AUC")
plt.legend()

plt.show()

The performance of the model is plotted. As it can be seen

In the first plot the validation loss decreases with the training loss. Both the line alomost lie on top of each each other which shows that the model is training well. As it can be seen ahead that the position at which the loss is minimum is 199. So it cannot be reduced further.

The second graph shows the training AC vs the validation AUC. The training AUC increases initially and then becomes constant. But the validation AUC is constant which show the TPR-FPR (True Positive Rate - False Positive Rate) is constant. In addition the the point at which the value of AUC is maximum is 0.

In [60]:
print(np.argmin(food_val_loss), np.argmax(food_val_auc))
199 0
In [61]:
food_model.evaluate(X_food_test, y_food_test)
77/77 [==============================] - 0s 81us/sample - loss: 0.4889 - acc: 0.8442 - auc: 0.7867
Out[61]:
[0.4888845519586043, 0.84415585, 0.7867347]

The average loss, accuracy and AUC are 0.267, 0.935, 0.89 respectively. The model has a good accuracy and AUC.

In [62]:
# predict probabilities for test set
yhat_probs = food_model.predict(X_food_test, verbose=0)
# predict crisp classes for test set
yhat_classes = food_model.predict_classes(X_food_test, verbose=0)
# confusion matrix
print('\nConfusion Matrix and Classification Report\n')
matrix = confusion_matrix(y_food_test, yhat_classes)
print(matrix)
target_names=['Class 0 - non-vegetarian','Class 1 - vegetarian']
print(classification_report(y_food_test, yhat_classes, target_names=target_names))
Confusion Matrix and Classification Report

[[ 3  4]
 [ 8 62]]
                          precision    recall  f1-score   support

Class 0 - non-vegetarian       0.27      0.43      0.33         7
    Class 1 - vegetarian       0.94      0.89      0.91        70

                accuracy                           0.84        77
               macro avg       0.61      0.66      0.62        77
            weighted avg       0.88      0.84      0.86        77

Model Evaluation

The Neural Network model evaluation With the ingredient vectors is shown below. The Confusion Matrix and Classification Report for the Standard Model gives the following results as seen:

• For Class 0 (Non-Vegetarian), 3 identified correctly 8 identified incorrectly.
• For Class 1 (Vegetarian), 62 identified correctly 4 identified incorrectly.
• True Positives – Number of correctly predicted positive values is 3.
• True Negatives - Number of correctly predicted negative values is 62.
• False Positives – Number of negative values incorrectly predicted as positive values is 8.
• False Negatives – Number of positive values incorrectly predicted as negative is 4.

The model has overall Precision of 88% and overall Accuracy of 84%.
The Precision for Class 0 / Non-Vegetarian is 27% due to large number of False positives.

Logistc Regression With PCA (Principal Component Analysis)

Logistic regression is a classification algorithm used when the dependent variable is binary. Like all regression analyses, the logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval independent variables.

There are 405 features in the model which might affect the model due to high dimensionality. Therefore, for dimensionality reduction PCA(Principal Component analysis) is conducted. Principal component analysis computes a new set of variables (principal components) and expresses the data in terms of these new variables.

In [63]:
from sklearn.decomposition import PCA

pca_none = PCA(n_components=None,random_state=100)
X_pca = pca_none.fit(X_food, y)
pca_var_ratios = pca_none.explained_variance_ratio_

# Create a function
def select_n_components(var_ratio, goal_var: float) -> int:
    total_variance = 0.0
    n_components = 0
    
    for explained_variance in var_ratio:
        total_variance += explained_variance
        n_components += 1
        if total_variance >= goal_var:
            break
            
    return n_components


n_comppca=select_n_components(pca_var_ratios, 0.95)
In [64]:
print(n_comppca)
175
In [65]:
pca = PCA(n_components=n_comppca,svd_solver='full')
transformed_data = pca.fit_transform(X_food)

x_train_log,x_test_log,y_train_log,y_test_log = train_test_split(X_food, y, test_size = 0.2, random_state = 100)
x_train_log_smt,y_train_log_smt = smt.fit_resample(x_train_log,y_train_log)

Train Test split is used to create the training set. The dataset used is with the ingredient vector dataframe appended to the other normalized and encoded features. SMOTE analysis is used to comensate the under-represented class's data points. After SMOTE we have 182 data points for both classes.

In [66]:
print('Train Data - Class Split')
Outcome_0= (y_train_log_smt == 0).sum()
Outcome_1 = (y_train_log_smt == 1).sum()
print('Class 0 (Non Vegetarian)-',  Outcome_0)
print('Class 1 (Vegetarian)-',  Outcome_1)
Train Data - Class Split
Class 0 (Non Vegetarian)- 182
Class 1 (Vegetarian)- 182
In [67]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

logreg = LogisticRegression(solver='liblinear', class_weight='balanced', random_state=100)
logreg.fit(x_train_log_smt,y_train_log_smt)
y_pred = logreg.predict(x_test_log)
target_names=['Class 0 - non-vegetarian','Class 1 - vegetarian']
print('\nNumber of PCA components:',n_comppca)
print('\nConfusion Matrix and Classification Report')
print('\n', confusion_matrix(y_test_log,y_pred))  
print('\n',classification_report(y_test_log,y_pred,target_names=target_names))
print('\n')
Number of PCA components: 175

Confusion Matrix and Classification Report

 [[ 7  0]
 [17 27]]

                           precision    recall  f1-score   support

Class 0 - non-vegetarian       0.29      1.00      0.45         7
    Class 1 - vegetarian       1.00      0.61      0.76        44

                accuracy                           0.67        51
               macro avg       0.65      0.81      0.61        51
            weighted avg       0.90      0.67      0.72        51



Model Evaluation

The Confusion Matrix and Classification Report for the Standard Model gives the following results as seen:

• For Class 0 (Non-Vegetarian), 7 identified correctly 17 identified incorrectly.
• For Class 1 (Vegetarian), 27 identified correctly 0 identified incorrectly.
• True Positives – Number of correctly predicted positive values is 7.
• True Negatives - Number of correctly predicted negative values is 27.
• False Positives – Number of negative values incorrectly predicted as positive values is 17.
• False Negatives – Number of positive values incorrectly predicted as negative is 0.

The model has overall Precision of 90% and overall Accuracy of 67%
The Precision for Class 0 / Non-Vegetarian is 29% due to large number of False positives.

In [68]:
from sklearn.metrics import roc_curve, roc_auc_score
fpr, tpr, threshold = roc_curve(y_test_log, y_pred)
auc_score = roc_auc_score(y_test_log, y_pred)
print('ROC Curve')
#Plot the ROC Curve
plt.title('Receiver Operating Characteristic')
plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % auc_score)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()
ROC Curve

The Receiver Operator Characteristic (ROC) curve is an evaluation metric for binary classification problems. It is a probability curve that plots the TPR against FPR at various threshold values. The AUC score the model is 81% which is a good score as it explains that the model is 81% accurate in distinguishing between the positive and negative classes.

Insights

The dataset used has some constraints as there are only 255 records with an imbalance in the number of samples provided. More balanced data could be used to better predict and answer the problem statement. The results of the model are able to classify the Vegetarian dishes with greater accuracy because of the bias

The dataset was a clean dataset which did not require synthetics values being added to which is a good sign.

There exsists a high multi-collinearity among the food vectors as evident from the heatmap. This showed that some of the dishes are hightly similar to the other dishes All all the features are kept for the model analysis and overall dimensionality reduction is conducted which produces good results for Logistic regression. For the Nerural Network model the Input layer shape is changed as per the data diensions

A acuuracy of 100% for Class 1 (Logistic Regression) and 95% for Class 1 (Neural network) show that the model works well for the study under consideration.

According to the problem statement the model works well classifying the Vegetarian dishes but not aacurate enough for non - vegetarian dishes because of the high imbalance.

In [69]:
import pickle
import joblib
capstone_1002_lg_model = 'capstone_1002_lg_model.pkl'
capstone_1002_lg_model_joblib = 'capstone_1002_lg_model.sav'

pickle.dump(logreg, open(capstone_1002_lg_model, 'wb'))
joblib.dump(logreg, capstone_1002_lg_model_joblib)
In [70]:
food_model.save("capstone_1002_tf_model.h5")
In [ ]: